Entropic Wasserstein Gradient Flows
This article details a novel numerical scheme to approximate gradient flows
for optimal transport (i.e. Wasserstein) metrics. These flows have proved
useful for tackling, both theoretically and numerically, non-linear diffusion
equations that model, for instance, porous media or crowd motion. These
gradient flows define a suitable notion of weak solutions for such evolutions,
and they can be approximated in a stable way by discrete flows obtained from
implicit Euler time stepping with respect to the Wasserstein metric. A
bottleneck of these approaches is the high computational cost of each step,
which amounts to solving a convex optimization problem involving a Wasserstein
distance to the previous iterate. Following
several recent works on the approximation of Wasserstein distances, we consider
a discrete flow induced by an entropic regularization of the transportation
coupling. This entropic regularization allows one to trade the initial
Wasserstein fidelity term for a Kullback-Leibler divergence, which is easier to
deal with numerically. We show how KL proximal schemes, and in particular
Dykstra's algorithm, can be used to compute each step of the regularized flow.
The resulting algorithm is fast, parallelizable and versatile, because it
only requires multiplications by a Gibbs kernel. On Euclidean domains
discretized on a uniform grid, this corresponds to a linear filtering (for
instance a Gaussian filtering when the ground cost is the squared Euclidean
distance), which
can be computed in nearly linear time. On more general domains, such as
(possibly non-convex) shapes or on manifolds discretized by a triangular mesh,
following a recently proposed numerical scheme for optimal transport, this
Gibbs kernel multiplication is approximated by a short-time heat diffusion.
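As a rough illustration of the building block involved (not the paper's Dykstra-based scheme), the sketch below runs plain Sinkhorn scaling iterations for entropic optimal transport between two histograms on a uniform 1-D grid; the only expensive operation is multiplication by the Gibbs kernel, which for the squared Euclidean cost amounts to a Gaussian filtering. The grid size, regularization strength and test densities are arbitrary choices.

```python
import numpy as np

# Hedged sketch (not the paper's Dykstra scheme): Sinkhorn scaling iterations
# for entropy-regularized optimal transport between two histograms p and q on
# a uniform 1-D grid. The only costly operation is multiplication by the Gibbs
# kernel K = exp(-C/eps); for the squared Euclidean cost this is a Gaussian
# filtering and could be replaced by a fast convolution.
def sinkhorn(p, q, x, eps=1e-2, n_iter=500):
    C = (x[:, None] - x[None, :]) ** 2          # squared Euclidean ground cost
    K = np.exp(-C / eps)                        # Gibbs kernel (a Gaussian here)
    v = np.ones_like(q)
    for _ in range(n_iter):
        u = p / (K @ v)                         # enforce the first marginal
        v = q / (K.T @ u)                       # enforce the second marginal
    return u[:, None] * K * v[None, :]          # regularized transport plan

x = np.linspace(0.0, 1.0, 200)
p = np.exp(-(x - 0.3) ** 2 / 0.01); p /= p.sum()
q = np.exp(-(x - 0.7) ** 2 / 0.02); q /= q.sum()
plan = sinkhorn(p, q, x)
print("marginal violation:", np.abs(plan.sum(axis=1) - p).max())
```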
A Smoothed Dual Approach for Variational Wasserstein Problems
Variational problems that involve Wasserstein distances have been recently
proposed to summarize and learn from probability measures. Despite being
conceptually simple, such problems are computationally challenging because they
involve minimizing over quantities (Wasserstein distances) that are themselves
hard to compute. We show that the dual formulation of Wasserstein variational
problems introduced recently by Carlier et al. (2014) can be regularized using
an entropic smoothing, which leads to smooth, differentiable, convex
optimization problems that are simpler to implement and numerically more
stable. We illustrate the versatility of this approach by applying it to the
computation of Wasserstein barycenters and gradient flows of spatial
regularization functionals.
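To make the kind of problem concrete, here is a minimal sketch of an entropy-regularized Wasserstein barycenter computed with Sinkhorn-type scalings; the paper instead works with a smoothed dual formulation, so this only illustrates the variational problem, not the authors' algorithm. The grid, weights and regularization value are arbitrary.

```python
import numpy as np

# Hedged sketch: entropy-regularized Wasserstein barycenter of several
# histograms via iterative Bregman/Sinkhorn-type scalings (the paper instead
# smooths the dual formulation of Carlier et al.). `hists` stacks one histogram
# per row; `weights` are barycentric weights summing to one.
def entropic_barycenter(hists, x, weights, eps=1e-2, n_iter=300):
    C = (x[:, None] - x[None, :]) ** 2
    G = np.exp(-C / eps)                           # Gibbs kernel
    V = np.ones_like(hists)                        # one scaling per input measure
    for _ in range(n_iter):
        U = hists / (V @ G.T)                      # enforce each input marginal
        b = np.exp(weights @ np.log(U @ G + 1e-300))   # geometric mean = barycenter
        V = b[None, :] / (U @ G)                   # enforce the common barycenter marginal
    return b

x = np.linspace(0.0, 1.0, 100)
h1 = np.exp(-(x - 0.25) ** 2 / 0.005); h1 /= h1.sum()
h2 = np.exp(-(x - 0.75) ** 2 / 0.005); h2 /= h2.sum()
bary = entropic_barycenter(np.stack([h1, h2]), x, np.array([0.5, 0.5]))
print("barycenter mass:", bary.sum())
```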
Sparse Spikes Deconvolution on Thin Grids
This article analyzes the recovery performance of two popular finite
dimensional approximations of the sparse spikes deconvolution problem over
Radon measures. We examine in a unified framework both the L1 regularization
(often referred to as Lasso or Basis-Pursuit) and the Continuous Basis-Pursuit
(C-BP) methods. The Lasso is the de-facto standard for the sparse
regularization of inverse problems in imaging. It performs a nearest neighbor
interpolation of the spike locations on the sampling grid. The C-BP method,
introduced by Ekanadham, Tranchina and Simoncelli, uses a linear interpolation
of the locations to perform a better approximation of the infinite-dimensional
optimization problem, for positive measures. We show that, in the small noise
regime, both methods estimate twice as many spikes as the original signal
contains. Indeed, we show that they both detect two neighboring spikes around
the location of each original spike. These results for deconvolution
problems are based on an abstract analysis of the so-called extended support of
the solutions of L1-type problems (including as special cases the Lasso and
C-BP for deconvolution), which is of independent interest. These results precisely
characterize the support of the solutions when the noise is small and the
regularization parameter is selected accordingly. We use these findings to
analyze, for the first time, the support instability of compressed sensing
recovery when the number of measurements falls below the critical limit (well
documented in the literature) above which the support is provably stable.
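For concreteness, the snippet below sets up the grid Lasso formulation of spike deconvolution with a Gaussian point spread function and solves it with a plain ISTA loop; with true spikes placed off the grid, the recovered support typically consists of pairs of neighboring grid points, as described above. The kernel width, grid size and regularization parameter are illustrative choices, and the paper's C-BP variant and extended-support analysis are not reproduced.

```python
import numpy as np

# Hedged illustration of the grid Lasso for spike deconvolution: a Gaussian
# point spread function sampled on a uniform grid, two true spikes placed off
# the grid, and a plain ISTA (proximal gradient) solver.
def ista_lasso(Phi, y, lam, n_iter=5000):
    L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of the smooth part
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = a - Phi.T @ (Phi @ a - y) / L        # gradient step on the data term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft thresholding
    return a

grid = np.linspace(0.0, 1.0, 128)
psf = lambda t: np.exp(-(grid[:, None] - t[None, :]) ** 2 / (2 * 0.02 ** 2))
Phi = psf(grid)                                  # dictionary of on-grid spikes
rng = np.random.default_rng(0)
y = psf(np.array([0.312, 0.68])) @ np.array([1.0, 0.8]) + 0.005 * rng.standard_normal(128)
a = ista_lasso(Phi, y, lam=0.01)
print("recovered grid support:", np.flatnonzero(np.abs(a) > 1e-4))
```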
Compressive Wave Computation
This paper considers large-scale simulations of wave propagation phenomena.
We argue that it is possible to accurately compute a wavefield by decomposing
it onto a largely incomplete set of eigenfunctions of the Helmholtz operator,
chosen at random, and that this provides a natural way of parallelizing wave
simulations for memory-intensive applications.
This paper shows that L1-Helmholtz recovery makes sense for wave computation,
and identifies a regime in which it is provably effective: the one-dimensional
wave equation with coefficients of small bounded variation. Under suitable
assumptions we show that the number of eigenfunctions needed to evolve a sparse
wavefield defined on N points, accurately with very high probability, is
bounded by C log(N) log(log(N)), where C is related to the desired accuracy and
can be made to grow at a much slower rate than N when the solution is sparse.
The PDE estimates that underlie this result are new to the authors' knowledge
and may be of independent mathematical interest; they include an L1 estimate
for the wave equation, an estimate of extension of eigenfunctions, and a bound
for eigenvalue gaps in Sturm-Liouville problems.
Numerical examples are presented in one spatial dimension and show that as
few as 10 percent of all eigenfunctions can suffice for accurate results.
Finally, we argue that the compressive viewpoint suggests a competitive
parallel algorithm for an adjoint-state inversion method in reflection
seismology.
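The following toy 1-D sketch conveys the compressive viewpoint under much stronger assumptions than the paper's (constant wave speed, Dirichlet boundaries): a random, incomplete set of Laplacian eigenmodes is evolved exactly in time, and the spatially sparse wavefield is then recovered from those few coefficients by an L1 solve, here a basic iterative soft-thresholding loop. All sizes and parameters are arbitrary.

```python
import numpy as np

# Hedged 1-D toy of the compressive viewpoint: evolve a random, incomplete set
# of Laplacian eigenmodes exactly in time, then recover the spatially sparse
# wavefield from those few coefficients by an L1 (basis-pursuit-type) solve.
rng = np.random.default_rng(0)
N, m, c = 512, 120, 1.0
k = np.arange(1, N + 1)
V = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(k, k) / (N + 1))  # DST-I eigenbasis
u0 = np.zeros(N); u0[[100, 300]] = 1.0              # spatially sparse initial wavefield
t = 40.0 / ((N + 1) * c)                            # propagation by 40 grid points
a_t = np.cos(np.pi * k * c * t) * (V @ u0)          # exact evolution of every mode coefficient
u_ref = V.T @ a_t                                   # reference wavefield at time t
rows = rng.choice(N, m, replace=False)              # keep only m random eigenmodes
A, b = V[rows], a_t[rows]

u, lam = np.zeros(N), 1e-4                          # soft-thresholding loop (rows of an
for _ in range(3000):                               # orthogonal matrix give step size 1)
    z = u - A.T @ (A @ u - b)
    u = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
print("relative recovery error:", np.linalg.norm(u - u_ref) / np.linalg.norm(u_ref))
```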
Local Linear Convergence Analysis of Primal-Dual Splitting Methods
In this paper, we study the local linear convergence properties of a
versatile class of Primal-Dual splitting methods for minimizing composite
non-smooth convex optimization problems. Under the assumption that the
non-smooth components of the problem are partly smooth relative to smooth
manifolds, we present a unified local convergence analysis framework for these
methods. More precisely, in our framework we first show that (i) the sequences
generated by Primal-Dual splitting methods identify a pair of primal and dual
smooth manifolds in a finite number of iterations, and then (ii) enter a local
linear convergence regime, which is characterized based on the structure of the
underlying active smooth manifolds. We also show how our results for
Primal-Dual splitting can be specialized to cover existing ones on
Forward-Backward splitting and Douglas-Rachford splitting/ADMM (alternating
direction methods of multipliers). Moreover, based on this local convergence
analysis, several practical acceleration techniques are discussed. To
exemplify the usefulness of these results, we consider several concrete
numerical experiments arising in signal/image processing, inverse problems and
machine learning. These experiments not only verify the local linear
convergence behaviour of Primal-Dual splitting methods, but also confirm the
insights on how to accelerate them in practice.
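As a concrete instance, the sketch below runs a Chambolle-Pock-type primal-dual splitting on a Lasso problem, whose l1 term is partly smooth relative to the manifold of vectors with a fixed sparse support; printing the support size along the iterations illustrates the finite identification that precedes the local linear regime. Problem sizes, noise level and step sizes are arbitrary, and this is only one member of the class of methods analyzed in the paper.

```python
import numpy as np

# Hedged sketch: a Chambolle-Pock-type primal-dual splitting applied to
#   min_x 0.5*||K x - b||^2 + lam*||x||_1,
# whose l1 term is partly smooth relative to the fixed-support manifold.
rng = np.random.default_rng(1)
n, p, lam = 64, 256, 0.1
K = rng.standard_normal((n, p)) / np.sqrt(n)
x_true = np.zeros(p); x_true[rng.choice(p, 8, replace=False)] = rng.standard_normal(8)
b = K @ x_true + 0.01 * rng.standard_normal(n)

tau = sigma = 0.9 / np.linalg.norm(K, 2)                 # ensures tau*sigma*||K||^2 < 1
x, y, x_bar = np.zeros(p), np.zeros(n), np.zeros(p)
for it in range(2001):
    y = (y + sigma * (K @ x_bar - b)) / (1 + sigma)      # prox of sigma*F*, F = 0.5*||. - b||^2
    z = x - tau * (K.T @ y)
    x_new = np.sign(z) * np.maximum(np.abs(z) - tau * lam, 0.0)  # prox of tau*lam*||.||_1
    x_bar = 2 * x_new - x
    x = x_new
    if it % 400 == 0:
        print(it, "support size:", int(np.count_nonzero(x)))   # finite identification
```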
Learning Generative Models with Sinkhorn Divergences
The ability to compare two degenerate probability distributions (i.e. two
probability distributions supported on two distinct low-dimensional manifolds
living in a much higher-dimensional space) is a crucial problem arising in the
estimation of generative models for high-dimensional observations such as those
arising in computer vision or natural language. It is known that optimal
transport metrics can remedy this problem, since they were
specifically designed as an alternative to information divergences to handle
such problematic scenarios. Unfortunately, training generative machines using
OT raises formidable computational and statistical challenges, because of (i)
the computational burden of evaluating OT losses, (ii) the instability and lack
of smoothness of these losses, and (iii) the difficulty of robustly estimating
these losses and their gradients in high dimension. This paper presents the first
tractable computational method to train large scale generative models using an
optimal transport loss, and tackles these three issues by relying on two key
ideas: (a) entropic smoothing, which turns the original OT loss into one that
can be computed using Sinkhorn fixed point iterations; (b) algorithmic
(automatic) differentiation of these iterations. These two approximations
result in a robust and differentiable approximation of the OT loss with
streamlined GPU execution. Entropic smoothing generates a family of losses
interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus
allowing one to find a sweet spot that leverages the geometry of OT and the
favorable high-dimensional sample complexity of MMD, which comes with unbiased
gradient estimates. The resulting computational architecture complements
standard deep generative network models nicely with a stack of extra layers
implementing the loss function.
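The sketch below isolates the loss ingredient only: a log-domain Sinkhorn fixed point computing the entropy-regularized OT cost between two point clouds. Training a generative model as proposed above would require writing the same loop in an automatic-differentiation framework and backpropagating through the unrolled iterations, and the debiasing that yields the actual Sinkhorn divergence is omitted; sample sizes and the regularization strength are arbitrary.

```python
import numpy as np
from scipy.special import logsumexp

# Hedged sketch of the loss ingredient only: a log-domain Sinkhorn fixed point
# computing the entropy-regularized OT cost between two uniformly weighted
# point clouds (no autodiff, no debiasing).
def entropic_ot_cost(X, Y, eps=0.05, n_iter=200):
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
    n, m = len(X), len(Y)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):                                # stable dual potential updates
        f = -eps * (np.log(n) + logsumexp((g[None, :] - C) / eps, axis=1))
        g = -eps * (np.log(m) + logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)        # primal transport plan
    return (P * C).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))                          # stand-in for model samples
Y = rng.standard_normal((300, 2)) + np.array([2.0, 0.0])   # stand-in for data samples
print("entropic OT cost:", entropic_ot_cost(X, Y))
```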